
ML-As-4

Problem 1: Neural Networks (20 pts)

Consider a 3-layer fully connected neural network with the following architecture:

  • Input layer: n = 4 neurons
  • Hidden layer: m = 3 neurons using a custom activation function $f(x) = \mathrm{ReLU}(x) + \sin(x)$
  • Output layer: k = 2 neurons using a softmax activation function $\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$

The network parameters (weights and biases) are given as:

  • $W_1 \in \mathbb{R}^{3 \times 4}$ and $b_1 \in \mathbb{R}^{3}$ for the hidden layer.
  • $W_2 \in \mathbb{R}^{2 \times 3}$ and $b_2 \in \mathbb{R}^{2}$ for the output layer.

Given the input vector $x \in \mathbb{R}^4$ and the target output $y \in \mathbb{R}^2$, define the loss function as the cross-entropy loss:

$$\text{Loss} = -\sum_{i=1}^{k} y_i \log(\hat{y}_i)$$

where $\hat{y}$ is the output after the softmax activation.
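
To make the setup concrete, here is a small NumPy sketch of the three ingredients above (the custom hidden activation, the softmax, and the cross-entropy loss). The function names are illustrative, not part of the assignment.

```python
import numpy as np

def f(x):
    """Custom hidden activation f(x) = ReLU(x) + sin(x), applied element-wise."""
    return np.maximum(x, 0.0) + np.sin(x)

def softmax(z):
    """Softmax over a 1-D vector z (shifted by max(z) for numerical stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y, y_hat):
    """Cross-entropy loss: -sum_i y_i * log(y_hat_i)."""
    return -np.sum(y * np.log(y_hat))
```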

Q1

1. Derive the equations for the forward pass through the network, including both the hidden and output layers. (3 pts)

Forward pass through the network:

  • Hidden layer:
$$z^{(1)} = W_1 x + b_1, \qquad h = f(z^{(1)}) = \mathrm{ReLU}(z^{(1)}) + \sin(z^{(1)})$$
  • Output layer:
$$z^{(2)} = W_2 h + b_2, \qquad \hat{y} = \sigma(z^{(2)}), \quad \hat{y}_i = \frac{e^{z_i^{(2)}}}{\sum_{j=1}^{2} e^{z_j^{(2)}}}$$
  • Loss:
$$\text{Loss} = -\sum_{i=1}^{2} y_i \log(\hat{y}_i)$$
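
These equations translate directly into a short forward-pass routine. A minimal sketch using the helper functions defined earlier (the variable and function names are illustrative):

```python
def forward(x, W1, b1, W2, b2):
    """Forward pass; returns the intermediate quantities needed later for backprop."""
    z1 = W1 @ x + b1      # net input to the hidden layer
    h = f(z1)             # hidden activation ReLU(z1) + sin(z1)
    z2 = W2 @ h + b2      # net input to the output layer
    y_hat = softmax(z2)   # predicted class probabilities
    return z1, h, z2, y_hat
```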

2. Calculate the outputs $Z_1$, $H$, $Z_2$, and $\hat{y}$ explicitly for the given input $x = [1, 1, 0.5, 2]^T$ and the following initial weights and biases:

$$W_1 = \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.5 & 0.3 & 0.1 & 0.2 \\ 0.4 & 0.2 & 0.5 & 0.3 \end{pmatrix}, \quad b_1 = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.05 \end{pmatrix}, \quad W_2 = \begin{pmatrix} 0.3 & 0.2 & 0.1 \\ 0.4 & 0.5 & 0.3 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 0.05 \\ 0.05 \end{pmatrix}$$

  • Note that $Z_1$ is the net input to the hidden layer, $H$ is the activation output of the hidden layer, and $Z_2$ is the net input to the output layer. (3 pts)

$$Z_1 = W_1 x + b_1 = \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.5 & 0.3 & 0.1 & 0.2 \\ 0.4 & 0.2 & 0.5 & 0.3 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0.5 \\ 2 \end{pmatrix} + \begin{pmatrix} 0.1 \\ 0.1 \\ 0.05 \end{pmatrix} = \begin{pmatrix} 1.35 \\ 0.35 \\ 0.6 \end{pmatrix}$$

$$H = \mathrm{ReLU}(Z_1) + \sin(Z_1) = \begin{pmatrix} 2.325 \\ 0.692 \\ 1.164 \end{pmatrix}$$

$$Z_2 = W_2 H + b_2 = \begin{pmatrix} 0.3 & 0.2 & 0.1 \\ 0.4 & 0.5 & 0.3 \end{pmatrix} \begin{pmatrix} 2.325 \\ 0.692 \\ 1.164 \end{pmatrix} + \begin{pmatrix} 0.05 \\ 0.05 \end{pmatrix} = \begin{pmatrix} 0.3927 \\ 0.8832 \end{pmatrix}$$

$$\hat{y} = \sigma(Z_2) = \begin{pmatrix} 0.2181 \\ 0.7819 \end{pmatrix}$$
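
For reference, the same arithmetic can be run with the `forward` sketch above. The arrays below simply transcribe the parameter values as printed in the problem statement (in particular, the signs of the entries are assumed to be exactly as shown), so the output can be compared against the hand computation.

```python
x  = np.array([1.0, 1.0, 0.5, 2.0])        # given input vector

W1 = np.array([[0.1, 0.2, 0.3, 0.4],       # hidden-layer weights, as printed above
               [0.5, 0.3, 0.1, 0.2],
               [0.4, 0.2, 0.5, 0.3]])
b1 = np.array([0.1, 0.1, 0.05])            # hidden-layer biases

W2 = np.array([[0.3, 0.2, 0.1],            # output-layer weights, as printed above
               [0.4, 0.5, 0.3]])
b2 = np.array([0.05, 0.05])                # output-layer biases

Z1, H, Z2, y_hat = forward(x, W1, b1, W2, b2)
print(Z1, H, Z2, y_hat)
```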

Q2

Derive the gradient of the loss with respect to each parameter ($W_1$, $b_1$, $W_2$, $b_2$) in the network and obtain the gradient values using the results from the first question. Use matrix calculus to express the gradients. Hint: you can first calculate the error terms $\delta_2$ and $\delta_1$ for each layer and use them to express the gradients. (10 pts)

Error term $\delta_2$

$$\delta_2 = \hat{y} - y$$

Gradient of the loss with respect to $W_2$

$$\frac{\partial \text{Loss}}{\partial W_2} = \frac{\partial \text{Loss}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial Z_2} \cdot \frac{\partial Z_2}{\partial W_2} = \delta_2 H^T$$

Gradient of the loss with respect to $b_2$

$$\frac{\partial \text{Loss}}{\partial b_2} = \frac{\partial \text{Loss}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial Z_2} \cdot \frac{\partial Z_2}{\partial b_2} = \delta_2$$

Error term $\delta_1$

$$\delta_1 = \frac{\partial \text{Loss}}{\partial H} \odot \frac{\partial H}{\partial Z_1} = (W_2^T \delta_2) \odot f'(Z_1), \qquad f'(Z_1) = \mathbb{1}[Z_1 > 0] + \cos(Z_1)$$

where $\odot$ denotes the element-wise product and $f'$ is the derivative of the hidden activation $f(x) = \mathrm{ReLU}(x) + \sin(x)$.

Gradient of the loss with respect to $W_1$

$$\frac{\partial \text{Loss}}{\partial W_1} = \frac{\partial \text{Loss}}{\partial H} \cdot \frac{\partial H}{\partial Z_1} \cdot \frac{\partial Z_1}{\partial W_1} = \delta_1 x^T$$

Gradient of the loss with respect to $b_1$

$$\frac{\partial \text{Loss}}{\partial b_1} = \frac{\partial \text{Loss}}{\partial H} \cdot \frac{\partial H}{\partial Z_1} \cdot \frac{\partial Z_1}{\partial b_1} = \delta_1$$
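
These expressions map one-to-one onto code. Below is a sketch of a backward pass that computes $\delta_2$, $\delta_1$, and the four gradients from the forward-pass quantities; the function and variable names are illustrative.

```python
def backward(x, y, z1, h, y_hat, W2):
    """Gradients of the cross-entropy loss w.r.t. W1, b1, W2, b2."""
    # Output-layer error term: softmax + cross-entropy gives delta2 = y_hat - y.
    delta2 = y_hat - y

    # Output-layer gradients: dLoss/dW2 = delta2 H^T, dLoss/db2 = delta2.
    dW2 = np.outer(delta2, h)
    db2 = delta2

    # Derivative of the hidden activation f(z) = ReLU(z) + sin(z),
    # i.e. f'(z) = 1[z > 0] + cos(z), applied element-wise.
    f_prime = (z1 > 0).astype(float) + np.cos(z1)

    # Hidden-layer error term: delta1 = (W2^T delta2) ⊙ f'(z1).
    delta1 = (W2.T @ delta2) * f_prime

    # Hidden-layer gradients: dLoss/dW1 = delta1 x^T, dLoss/db1 = delta1.
    dW1 = np.outer(delta1, x)
    db1 = delta1

    return dW1, db1, dW2, db2
```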

Q3

Suppose the learning rate is $\alpha = 0.001$. Calculate the updated parameter values after one backpropagation step. (4 pts)

The general update rule for each parameter $\theta$ with gradient descent is:

$$\theta \leftarrow \theta - \alpha \frac{\partial \text{Loss}}{\partial \theta}$$

where $\alpha$ is the learning rate.

Updating $W_2$:

$$W_2^{\text{new}} = W_2 - \alpha \frac{\partial \text{Loss}}{\partial W_2} = W_2 - 0.001\, \delta_2 H^T$$

Updating $b_2$:

$$b_2^{\text{new}} = b_2 - \alpha \frac{\partial \text{Loss}}{\partial b_2} = b_2 - 0.001\, \delta_2$$

Updating $W_1$:

$$W_1^{\text{new}} = W_1 - \alpha \frac{\partial \text{Loss}}{\partial W_1} = W_1 - 0.001\, \delta_1 x^T$$

Updating $b_1$:

$$b_1^{\text{new}} = b_1 - \alpha \frac{\partial \text{Loss}}{\partial b_1} = b_1 - 0.001\, \delta_1$$
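
Putting the pieces together, one gradient-descent step with $\alpha = 0.001$ can be sketched as follows, reusing the `forward` and `backward` helpers above. The target vector `y` below is only a hypothetical placeholder, since its numerical value is not listed here.

```python
alpha = 0.001
y = np.array([0.0, 1.0])   # hypothetical one-hot target; use the actual y from the assignment

Z1, H, Z2, y_hat = forward(x, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(x, y, Z1, H, y_hat, W2)

# One vanilla gradient-descent update per parameter.
W1_new = W1 - alpha * dW1
b1_new = b1 - alpha * db1
W2_new = W2 - alpha * dW2
b2_new = b2 - alpha * db2
```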